Single AZ Support #93

Merged
merged 4 commits into from
Apr 27, 2022

nickumia-reisys
Contributor

This allows users to specify that the entire workload should be run within a single AWS Availability Zone for latency or other operational requirements.

New Additions:

  • Restrict Managed Node Group Node Creation to a single availability zone if single_az = true
  • Change volume binding for storage class to "WaitForFirstConsumer" to ensure it can be attached to nodes with the correct topology (specifically, within the same availability zone)

This is not the optimal solution; it is primarily a quick workaround. Long term, the auto-scaling specification must ensure that new nodes are only created within the same availability zone as the existing ones, and we need to confirm that there are no other edge cases to consider. Forcing the managed node group to live within a single subnet means its nodes are assigned solely to that subnet's availability zone.
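The subnet restriction described above can be sketched as follows. This is only an illustration of the selection logic in Python, not the Terraform from this PR: the `Subnet` type, the `node_group_subnets` helper, and the subnet IDs are hypothetical, and picking the first subnet is an assumption about which one gets chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    subnet_id: str
    availability_zone: str


def node_group_subnets(subnets, single_az):
    """Return the subnets a managed node group may launch nodes into."""
    if not subnets:
        raise ValueError("at least one subnet is required")
    if single_az:
        # Pinning the group to one subnet pins every node to that subnet's AZ.
        # Choosing the first subnet is an assumption made for this sketch.
        return [subnets[0]]
    return list(subnets)


# Hypothetical subnets, one per availability zone.
private_subnets = [
    Subnet("subnet-aaa", "us-west-2a"),
    Subnet("subnet-bbb", "us-west-2b"),
    Subnet("subnet-ccc", "us-west-2c"),
]

pinned = node_group_subnets(private_subnets, single_az=True)
spread = node_group_subnets(private_subnets, single_az=False)
```

With `single_az=True` the node group sees a single subnet, so any node it creates lands in that subnet's zone. Auto-scaling behavior is outside the scope of this sketch.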

References for creating an optional nested block:

Additional Background surrounding Storage Classes:

A cluster administrator can address this issue by specifying the WaitForFirstConsumer mode which will delay the binding and provisioning of a PersistentVolume until a Pod using the PersistentVolumeClaim is created. PersistentVolumes will be selected or provisioned conforming to the topology that is specified by the Pod's scheduling constraints. These include, but are not limited to, resource requirements, node selectors, pod affinity and anti-affinity, and taints and tolerations.

If a PV is provisioned in AZ A and a pod requires that PV, but there are no available nodes in AZ A because new pods were higher in the scheduling queue, the pod will be stuck in a Pending state until a node becomes available in AZ A, where its already-provisioned volume exists. By setting "WaitForFirstConsumer" on the storage class, the PV won't be provisioned until the pod is scheduled, so the pod's placement is known and a volume compatible with it can be created.
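For reference, a storage class with delayed binding could look roughly like the manifest built below. This is a minimal sketch: the class name `ebs-wait` is hypothetical, and `ebs.csi.aws.com` (the AWS EBS CSI driver) is assumed as the provisioner, which may not match what this module actually deploys.

```python
import json


def storage_class_manifest(name, provisioner):
    """Build a StorageClass manifest that delays volume binding."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        # Delay binding and provisioning until a pod that uses the claim
        # has been scheduled, so the volume is created in that pod's zone.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


# Name and provisioner are assumptions made for this example.
manifest = storage_class_manifest("ebs-wait", "ebs.csi.aws.com")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML.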

Some workloads have latency or other technical requirements that force pods to be within the same availability zone. This enables the managed node group to exist in only one availability zone.
@nickumia-reisys nickumia-reisys requested a review from a team April 26, 2022 20:41
@FuhuXia
Contributor

FuhuXia commented Apr 26, 2022

Is there any way to verify each node's AZ before and after the change?

@nickumia-reisys
Contributor Author

@FuhuXia There is a way to tell which nodes are in which AZ, but there isn't really a notion of "before" and "after" for this change. If you had an existing cluster deployed manually with terraform apply and then re-applied it with the new option single_az=true, you could check either the AWS Console or kubectl describe nodes to see where the nodes are. If the cluster was deployed through a Broker, you could do the same thing, but there would be another layer of abstraction to work around. You could inspect the cluster, then try to take advantage of GSA/data.gov#3083 to upgrade an instance and inspect it again afterwards. However, there isn't an automated way to inventory this right now. The best option would be a command like the following:

nickumia@DL62-2-2MDD043:~/eks-brokerpak/terraform/modules/provision-aws$ kubectl describe node | grep "\(Name:\|topology.kubernetes.io/zone\)"
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
Name:               ***.compute.internal
                    topology.kubernetes.io/zone=us-west-2a
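
The same check could be scripted. The sketch below assumes the output of `kubectl get nodes -o json` is available as a string; the helper names `zone_counts` and `is_single_az` and the sample node names are hypothetical, and only the `topology.kubernetes.io/zone` label shown above is relied on.

```python
import json
from collections import Counter

ZONE_LABEL = "topology.kubernetes.io/zone"


def zone_counts(nodes_json):
    """Count nodes per availability zone from `kubectl get nodes -o json` output."""
    items = json.loads(nodes_json)["items"]
    return Counter(
        item["metadata"].get("labels", {}).get(ZONE_LABEL, "unknown")
        for item in items
    )


def is_single_az(nodes_json):
    """True when every node carries the zone label and all zones match."""
    counts = zone_counts(nodes_json)
    return len(counts) == 1 and "unknown" not in counts


# Hand-written sample standing in for real kubectl output.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "node-1", "labels": {ZONE_LABEL: "us-west-2a"}}},
        {"metadata": {"name": "node-2", "labels": {ZONE_LABEL: "us-west-2a"}}},
    ]
})

print(dict(zone_counts(sample)))
print(is_single_az(sample))
```

Running this before and after re-applying with single_az=true would give a rough comparison, although it still has to be run by hand against each cluster.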

@mogul mogul enabled auto-merge (squash) April 27, 2022 21:33
@mogul mogul merged commit 52648ed into main Apr 27, 2022
@mogul mogul deleted the single-az-support branch April 27, 2022 21:58
@mogul
Collaborator

mogul commented May 2, 2022

Noting here since it will probably get the right eyes: Karpenter added pod affinity support.

3 participants